DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels

Authors

  • Lukas Folkman
  • Yuedong Yang
  • Zhixiu Li
  • Bela Stantic
  • Abdul Sattar
  • Matthew E. Mort
  • David N. Cooper
  • Yunlong Liu
  • Yaoqi Zhou

Abstract

MOTIVATION: Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem.

RESULTS: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and performs robustly in 10-fold cross-validation and in independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to make a significant improvement over existing techniques.
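
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and MCC are computed from a binary confusion matrix. It is not the authors' evaluation code. The function name `classification_metrics` and the counts used below are hypothetical: the counts assume a balanced set of 100 disease-causing and 100 neutral variants, chosen only so that the rates match the 86% and 72% quoted above. The actual dataset composition is not given in this abstract.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts.

    Assumes at least one positive and one negative example are present.
    """
    sensitivity = tp / (tp + fn)  # fraction of disease-causing variants detected
    specificity = tn / (tn + fp)  # fraction of neutral variants correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical, balanced counts (an assumption, not data from the paper).
sens, spec, mcc = classification_metrics(tp=86, fn=14, tn=72, fp=28)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Under this balanced assumption the sketch gives an MCC of about 0.59. Because MCC depends on the class balance, the same sensitivity and specificity would give a different MCC on an unbalanced set.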

Similar resources

Prediction of Protein-Destabilizing Polymorphisms by Manual Curation with Protein Structure

The relationship between sequence polymorphisms and human disease has been studied mostly in terms of effects of single nucleotide polymorphisms (SNPs) leading to single amino acid substitutions that change protein structure and function. However, less attention has been paid to more drastic sequence polymorphisms which cause premature termination of a protein's sequence or large changes, inser...

Investigation of Polymorphisms in Non-Coding Region of Human Mitochondrial DNA in 31 Iranian Hypertrophic Cardiomyopathy (HCM) Patients

The D-loop region is a hot spot for mitochondrial DNA (mtDNA) alterations, containing two hypervariable segments, HVS-I and HVS-II. In order to identify polymorphic sites and potential genetic background accounting for Hypertrophic CardioMyopathy (HCM) disease, the complete non-coding region of mtDNA from 31 unrelated HCM patients and 45 normal controls were sequenced. The sequences were aligne...

Translational suppressors and antisuppressors alter the efficiency of the Ty1 programmed translational frameshift.

Certain viruses, transposons, and cellular genes have evolved specific sequences that induce high levels of specific translational errors. Such "programmed misreading" can result in levels of frameshifting or nonsense codon readthrough that are up to 1,000-fold higher than normal. Here we determine how a number of mutations in yeast affect the programmed misreading used by the yeast Ty retrotra...

Genetic Variations in Exon 3 of VWF Gene in Patients with Von Willebrand Disease (VWD) from South-West Iran

Abstract Background Von Willebrand disease (VWD) is an autosomally inherited bleeding disorder with the prevalence of 1% based on population studies. The disease phenotype is due to quantitative and structural/functional defects in Von Willebrand Factor (VWF) which is a glycoprotein with essential role as a carrier of FVIII in circulation and also it serves the function as hemostasis regulato...

Study on Genetic Diversity of Terminal Fragment Sequence of Isolated Persian Tobacco Mosaic Virus

Tobacco mosaic virus (TMV) is one of the devastating plant viruses in the world that infects more than 200 plant species. Movement protein plays a supportive role in the movement of other plant viruses, and viral coat protein is highly expressed in infected plants and affects replication and movements of TMV. In order to investigate genetic variation in the terminal fragment sequence in Iranian...

Journal:
  • Bioinformatics

Volume 31, Issue 10

Pages: -

Publication date: 2015